78 research outputs found

    Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

    Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, "Does the optimal clustering put vertices u and v in the same cluster?" Due to its simplicity, this querying model can easily be implemented on real crowd-sourcing platforms and has attracted a lot of recent work. In this paper, we study the popular correlation clustering problem (Bansal et al., 2002) under the same-cluster querying framework. Given a complete graph G=(V,E) with positive and negative edge labels, the correlation clustering objective is to compute a graph clustering that minimizes the total number of disagreements, that is, the negative intra-cluster edges and positive inter-cluster edges. In recent work, Ailon et al. (2018b) provided an algorithm that approximates the correlation clustering objective within a factor of (1+epsilon) using O((k^{14} log{n} log{k})/epsilon^6) queries when the number of clusters, k, is fixed. For many applications, k is not fixed and can grow with |V|. Moreover, the k^{14} dependence of the query complexity renders the algorithm impractical even for datasets with small values of k. In this paper, we take a different approach. Let C_{OPT} be the number of disagreements made by the optimal clustering. We present algorithms for correlation clustering whose error and query bounds are parameterized by C_{OPT} rather than by the number of clusters. Indeed, a good clustering must have small C_{OPT}. Specifically, we present an efficient algorithm that recovers an exact optimal clustering using at most 2C_{OPT} queries and an efficient algorithm that outputs a 2-approximation using at most C_{OPT} queries. In addition, we show that, under a plausible complexity assumption, there does not exist any polynomial-time algorithm that has an approximation ratio better than 1+alpha for an absolute constant alpha > 0 with o(C_{OPT}) queries. Therefore, our first algorithm achieves the optimal query bound within a factor of 2. We extensively evaluate our methods on several synthetic and real-world datasets using real crowd-sourced oracles. Moreover, we compare our approach against known correlation clustering algorithms that do not perform querying. In all cases, our algorithms exhibit superior performance.
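
To make the objective and the query model above concrete, here is a minimal Python sketch. It is not the paper's algorithm: it only counts the disagreements of a given clustering and wraps a known clustering as a same-cluster oracle that tallies queries. The names `count_disagreements` and `make_same_cluster_oracle`, and the toy graph, are illustrative assumptions.

```python
from itertools import combinations


def count_disagreements(labels, positive_edges):
    """Count disagreements of a clustering on a complete signed graph.

    labels: dict mapping each vertex to a cluster id (hypothetical format).
    positive_edges: set of frozenset({u, v}); every other pair is negative.
    """
    disagreements = 0
    for u, v in combinations(sorted(labels), 2):
        same_cluster = labels[u] == labels[v]
        is_positive = frozenset((u, v)) in positive_edges
        # A disagreement is a negative edge inside a cluster
        # or a positive edge between two clusters.
        if same_cluster != is_positive:
            disagreements += 1
    return disagreements


def make_same_cluster_oracle(reference_labels):
    """Return an oracle answering "are u and v in the same cluster?"
    with respect to reference_labels, plus a counter of queries asked."""
    queries = {"count": 0}

    def oracle(u, v):
        queries["count"] += 1
        return reference_labels[u] == reference_labels[v]

    return oracle, queries


# Toy instance (assumed for illustration): a positive path a-b-c-d.
positive = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("c", "d")]}
clustering = {"a": 0, "b": 0, "c": 0, "d": 1}
print(count_disagreements(clustering, positive))
```

In this toy instance the disagreements are the negative edge (a, c) inside the first cluster and the positive edge (c, d) across clusters.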

    Renting a Cloud

    We consider the problem of efficiently scheduling jobs on data centers to minimize the cost of renting machines from "the cloud." In the most basic cloud service model, cloud providers offer computers on demand from large pools installed in data centers. Clients pay for use at an hourly rate. In order to minimize cost, each client needs to decide on the number of machines to be rented and the duration of renting each machine. This suggests the following optimization problem, which we call Rent Minimization. There is a set J={j_1,j_2,...,j_n} of n jobs. Job j_i is released at time r_i >= 0, has a deadline of d_i, and requires p_i > 0 contiguous processing time, with r_i, d_i, p_i in R. The jobs need to be scheduled on identical parallel machines. Machines may be rented for any length of time; however, the cost of renting a machine for l >= 0 time units is ⌈l/D⌉ (the smallest integer >= l/D) dollars, for some given large real D; in particular, one pays $2 whether the machine is rented for D+1 or 2D time units. The goal is to schedule all the jobs in a way that minimizes the incurred rental cost. In this paper, we develop offline and online algorithms for the Rent Minimization problem. The algorithms achieve a constant-factor approximation for the offline version and O(log(p_max/p_min)) for the online version, where p_max and p_min are the maximum and minimum processing times of the jobs, respectively. We also show that no deterministic online algorithm can achieve an approximation factor better than log_{3}(p_max/p_min) within a constant factor. Both of these algorithms use the well-studied problem of Machine Minimization as a subroutine. Machine Minimization is a special case of Rent Minimization where D = max_{i}{d_i}. In the process of solving the Rent Minimization problem, we also develop the first online algorithm for Machine Minimization.
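
The rental cost rule in the abstract above can be illustrated with a short Python sketch. The function names `rental_cost` and `total_rental_cost` and the value D = 10 are assumptions chosen for illustration; only the ceiling rule itself comes from the abstract.

```python
import math


def rental_cost(duration, D):
    """Cost in dollars of renting one machine for `duration` time units,
    i.e. the smallest integer >= duration / D."""
    if duration < 0 or D <= 0:
        raise ValueError("duration must be >= 0 and D must be > 0")
    return math.ceil(duration / D)


def total_rental_cost(durations, D):
    """Total cost when each entry of `durations` is one machine's rental time."""
    return sum(rental_cost(l, D) for l in durations)


D = 10  # assumed billing period for the example
# Renting for D + 1 or for 2 * D time units costs the same amount.
print(rental_cost(D + 1, D), rental_cost(2 * D, D))
```

Integer inputs are used here so the ceiling is exact; with floating-point durations, rounding in the division could matter near multiples of D.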

    Energy Efficient Scheduling via Partial Shutdown

    Motivated by issues of saving energy in data centers we define a collection of new problems referred to as "machine activation" problems. The central framework we introduce considers a collection of $m$ machines (unrelated or related) with each machine $i$ having an activation cost of $a_i$. There is also a collection of $n$ jobs that need to be performed, and $p_{i,j}$ is the processing time of job $j$ on machine $i$. We assume that there is an activation cost budget of $A$: we would like to select a subset $S$ of the machines to activate with total cost $a(S) \le A$ and find a schedule for the $n$ jobs on the machines in $S$ minimizing the makespan (or any other metric). For the general unrelated machine activation problem, our main results are that if there is a schedule with makespan $T$ and activation cost $A$ then we can obtain a schedule with makespan \makespanconstant T and activation cost \costconstant A, for any $\epsilon > 0$. We also consider assignment costs for jobs as in the generalized assignment problem, and using our framework, provide algorithms that minimize the machine activation and the assignment cost simultaneously. In addition, we present a greedy algorithm which only works for the basic version and yields a makespan of $2T$ and an activation cost $A(1+\ln n)$. For the uniformly related parallel machine scheduling problem, we develop a polynomial time approximation scheme that outputs a schedule with the property that the activation cost of the subset of machines is at most $A$ and the makespan is at most $(1+\epsilon)T$ for any $\epsilon > 0$.
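
As a rough illustration of the quantities in the machine activation framework above, the following Python sketch evaluates a given job-to-machine assignment: it computes the activation cost of the machines in use, the makespan, and whether the budget is respected. It is an evaluator only, not any of the paper's algorithms; the function name `evaluate_activation`, the data layout, and the numbers are assumptions.

```python
def evaluate_activation(activation_cost, proc_time, assignment, budget):
    """Evaluate a candidate solution to a machine activation instance.

    activation_cost[i]: activation cost a_i of machine i.
    proc_time[i][j]: processing time p_{i,j} of job j on machine i.
    assignment: dict mapping job j to the machine i that runs it;
        the activated set S is taken to be the machines that appear here.
    budget: activation cost budget A.
    Returns (activation cost of S, makespan, whether cost <= budget).
    """
    active = set(assignment.values())
    cost = sum(activation_cost[i] for i in active)
    load = {i: 0 for i in active}
    for job, machine in assignment.items():
        load[machine] += proc_time[machine][job]
    # Makespan is the largest total load on any activated machine.
    makespan = max(load.values(), default=0)
    return cost, makespan, cost <= budget


# Hypothetical instance: 3 unrelated machines, 3 jobs.
activation_cost = [3, 5, 2]
proc_time = [
    [2, 4, 6],  # machine 0
    [3, 1, 2],  # machine 1
    [5, 5, 1],  # machine 2
]
assignment = {0: 0, 1: 1, 2: 1}  # job -> machine
print(evaluate_activation(activation_cost, proc_time, assignment, budget=8))
```

Here machines 0 and 1 are activated, and machine 1 carries jobs 1 and 2, so it determines the makespan.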